COVSMA

Table of Contents

COVSMA's first tool: COVSCO

COVSMA stands for Copernicus Satellites Versus Maladies. The current health crisis creates the need for an online tool that monitors pollution levels, displays alerts that prompt governments to take measures automatically (for example, days without cars and trucks that are not 100% electric), monitors the live impact of the measures taken, forecasts COVID19 risk due to PM2.5 exposure up to 4 days ahead, and predicts new hospitalisations due to severe COVID19 cases for all states/departements. By analogy, we have named this tool COVSCO (Copernicus Satellites Versus COVID19). We start with France and its 96 departements. A follow-up will apply the same methodology to severe respiratory diseases and expand the model and databases to a global scale.

Data Exploration

Introduction to the data: our X, our y

The features: 22 variables from which we will predict our target

The Target: New hospitalizations due to severe COVID19 cases

The daily number of new hospitalizations due to severe COVID19 cases for every French departement is what we will predict.

Mean new hospitalizations over all departements as a function of 7davg pollutant concentration differentials

Mean new hospitalizations over all departements as a function of 1Mavg pollutant concentration differentials

Mean new hospitalizations over all departements as a function of 1MMax pollutant concentration differentials
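The windowed aggregates named in these headings can be sketched with pandas rolling windows. This is a minimal illustration only: it assumes that 7davg, 1Mavg and 1MMax mean a 7-day average, a 30-day average and a 30-day maximum, and the column names, the synthetic PM2.5 values and the two departements are hypothetical. How the project derives the "differentials" from these aggregates is not stated, so that step is left out.

```python
import numpy as np
import pandas as pd

# Hypothetical daily PM2.5 series for two departements (synthetic values).
dates = pd.date_range("2020-03-01", periods=60, freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "departement": ["75"] * len(dates) + ["83"] * len(dates),
    "date": list(dates) * 2,
    "pm25": rng.gamma(shape=2.0, scale=5.0, size=2 * len(dates)),
})
df = df.sort_values(["departement", "date"]).reset_index(drop=True)

# Rolling aggregates computed separately for each departement.
# Assumption: "1M" is approximated by a 30-day window.
grouped = df.groupby("departement")["pm25"]
df["pm25_7davg"] = grouped.transform(lambda s: s.rolling(7, min_periods=1).mean())
df["pm25_1Mavg"] = grouped.transform(lambda s: s.rolling(30, min_periods=1).mean())
df["pm25_1MMax"] = grouped.transform(lambda s: s.rolling(30, min_periods=1).max())
```

Grouping before rolling keeps the window of one departement from spilling into the next one.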

Ozone (O3) and the number of severe COVID19 cases leading to hospitalization

Departement 75: Paris region Ile de France

Departement 83: Var region PACA

Nitrogen dioxide (NO2) and the number of severe COVID19 cases leading to hospitalization

Departement 75: Paris region Ile de France

Departement 83: Var region PACA

PM2.5 and the number of severe COVID19 cases leading to hospitalization

Departement 75: Paris region Ile de France

Departement 83: Var region PACA

CO and the number of severe COVID19 cases leading to hospitalization

Departement 75: Paris region Ile de France

The most polluted departements of France

The Ozone O3 Pollutant

The PM2.5 Pollutant

The NO2 Pollutant

The CO Pollutant

The PM10 Pollutant

The Facebook mobility index

The Facebook mobility index VS Ozone O3

The Facebook mobility index VS Carbon Monoxide CO

The Facebook mobility index VS Nitrogen Dioxide NO2

The Facebook mobility index VS PM2.5

The Facebook mobility index VS PM10

Training the model - Gradient Boosting for regression

Gradient Boosting for regression.

GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
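As a minimal sketch of this stage-wise fitting, the example below trains scikit-learn's GradientBoostingRegressor on synthetic data. The 22-column feature matrix only mimics the shape of the real dataset; the data, the hyperparameters and the variable names are assumptions, not the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 22 features, one numeric target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 22))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each of the 200 stages fits a small regression tree on the negative
# gradient of the loss (squared error by default) and adds it to the model.
gbr = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
gbr.fit(X_train, y_train)

r2 = gbr.score(X_test, y_test)   # coefficient of determination on held-out data
stage_losses = gbr.train_score_  # training loss after each boosting stage
```

The `train_score_` array makes the additive construction visible: the training loss shrinks as stages are added.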

Stack of estimators with a final regressor.

Stacked generalization consists of stacking the outputs of the individual estimators and using a regressor to compute the final prediction. Stacking makes it possible to use the strength of each individual estimator by using their outputs as the input of a final estimator.

Note that estimators_ are fitted on the full X while final_estimator_ is trained using cross-validated predictions of the base estimators using cross_val_predict.
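A stack of this kind can be sketched with scikit-learn's StackingRegressor. The choice of base estimators (gradient boosting and a random forest) and of RidgeCV as the final regressor is an assumption made for illustration, as is the synthetic data; the project's actual stack is not listed here.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV

# Synthetic stand-in for the real data: 22 features, one numeric target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 22))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)

# Hypothetical base estimators; their predictions feed the final regressor.
base_estimators = [
    ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# cv=5: the final estimator is trained on 5-fold cross-validated
# predictions, while the base estimators are refitted on the full X.
stack = StackingRegressor(
    estimators=base_estimators, final_estimator=RidgeCV(), cv=5
)
stack.fit(X, y)

# One column of base-estimator predictions per base estimator.
meta_features = stack.transform(X)
y_pred = stack.predict(X)
```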

Hold-out and Cross Validation (MSE/MAE)
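Both evaluation schemes can be sketched as follows. The synthetic data, the 80/20 split, the 5 folds and the plain GradientBoostingRegressor are assumptions chosen to keep the example short; they are not the project's reported setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import cross_validate, train_test_split

# Synthetic stand-in for the real data: 22 features, one numeric target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 22))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=400)

# Hold-out: train on 80% of the rows, score on the remaining 20%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)
holdout_mse = mean_squared_error(y_test, y_pred)
holdout_mae = mean_absolute_error(y_test, y_pred)

# 5-fold cross-validation. scikit-learn reports negated errors so that
# "higher is better" holds for every scorer; flip the sign back afterwards.
cv = cross_validate(
    GradientBoostingRegressor(random_state=0),
    X,
    y,
    cv=5,
    scoring=("neg_mean_squared_error", "neg_mean_absolute_error"),
)
cv_mse = -cv["test_neg_mean_squared_error"].mean()
cv_mae = -cv["test_neg_mean_absolute_error"].mean()
```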

Feature importance Report

FIRclass1.png

FIRclass2.png

Exporting the model to a joblib file
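A minimal sketch of the export and reload round trip with joblib is shown below. The file name `model.joblib`, the temporary directory and the small model are hypothetical placeholders.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Small model trained on synthetic data, standing in for the real one.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 22))
y = X[:, 0] + rng.normal(scale=0.1, size=100)
model = GradientBoostingRegressor(n_estimators=20, random_state=0).fit(X, y)

# Hypothetical output location; any writable path works.
model_path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, model_path)

# Reload and check that the restored model predicts identically.
restored = joblib.load(model_path)
same_predictions = np.allclose(model.predict(X), restored.predict(X))
```

A joblib file is a pickle underneath, so it should only be loaded from a trusted source and with a compatible scikit-learn version.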

Running the TPOT AutoML optimization algorithm

Recurrent Neural Network

A classification of pollution levels, using the elbow method to get the optimal number of clusters

The KMeans elbow method

The elbow method determines that the optimal number of clusters for PM2.5 Levels is k = 4
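The elbow method can be sketched as follows on synthetic PM2.5-like values. The four cluster centres, the range of k and the 50% threshold used to locate the elbow programmatically are all assumptions; the elbow is usually read off a plot of inertia against k.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic PM2.5-like values drawn around four hypothetical levels.
rng = np.random.default_rng(0)
centers = [5.0, 15.0, 30.0, 60.0]
pm25 = np.concatenate(
    [rng.normal(c, 1.0, size=200) for c in centers]
).reshape(-1, 1)

# Inertia (within-cluster sum of squares) for k = 1..8.
ks = list(range(1, 9))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(pm25).inertia_
    for k in ks
]

# Illustrative heuristic: the elbow is the first k for which adding one
# more cluster reduces the inertia by less than 50%.
relative_drops = [
    (inertias[i] - inertias[i + 1]) / inertias[i]
    for i in range(len(inertias) - 1)
]
elbow_k = next(ks[i] for i, drop in enumerate(relative_drops) if drop < 0.5)
```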

Finding minima in the kernel density estimate to identify splitting points, and describing the resulting ranges as (min, max) intervals
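One possible way to do this uses SciPy's gaussian_kde together with argrelextrema. The synthetic values, the bandwidth factor `bw_method=0.1` and the grid resolution are assumptions tuned to this toy data; real PM2.5 measurements would likely need a different bandwidth.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.stats import gaussian_kde

# Synthetic PM2.5-like values drawn around four hypothetical levels.
rng = np.random.default_rng(0)
centers = [5.0, 15.0, 30.0, 60.0]
pm25 = np.concatenate([rng.normal(c, 1.0, size=200) for c in centers])

# Kernel density estimate evaluated on a regular grid.
# Assumption: a bandwidth factor of 0.1 keeps the four modes separate.
kde = gaussian_kde(pm25, bw_method=0.1)
grid = np.linspace(pm25.min(), pm25.max(), 1000)
density = kde(grid)

# Strict local minima of the density are the splitting points.
minima_idx = argrelextrema(density, np.less)[0]
split_points = grid[minima_idx]

# Describe the resulting ranges as (min, max) intervals.
edges = np.concatenate(([pm25.min()], split_points, [pm25.max()]))
intervals = [(float(lo), float(hi)) for lo, hi in zip(edges[:-1], edges[1:])]
```

Each interval then corresponds to one pollution level, bounded by the troughs between neighbouring density peaks.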

Conclusion

Design & Development:

Verification of methodology

Forecast maps of the risk of severe COVID19 cases due to PM2.5